Acute myeloid leukemia (AML) remains a clinical challenge with limited targeted therapies and dismal clinical outcomes. We previously identified Interferon Regulatory Factor 2 Binding Protein 2 (IRF2BP2) as a selective AML dependency, demonstrating that its loss induces cell death in AML patient cells, while sparing colony-forming capacity in healthy donor-derived CD34⁺ bone marrow cells. These findings suggest a potential therapeutic window for targeting IRF2BP2 in AML.

However, no IRF2BP2-specific inhibitors or degraders are currently available for clinical testing, in part due to the protein's structure, which lacks an easily targetable domain. To address this limitation, we developed a computational strategy that integrates machine learning with transcriptomic data to identify upstream druggable regulators of IRF2BP2 expression in an unbiased manner.

We first trained more than 90 neural network architectures in a supervised machine-learning framework to predict IRF2BP2 expression levels from publicly available bulk RNA sequencing data sets of human AML samples, which were identified using a curated list of terms related to myeloid biology. These architectures varied in depth, number of hidden units, activation functions, and regularization strategies. We evaluated performance using the coefficient of determination (R²), which is well suited to predicting expression levels and, unlike a raw loss value, is easy to interpret because it reports the fraction of the variation in IRF2BP2 expression that the model explains. We next used the trained models to identify the genes with the greatest influence on the predicted IRF2BP2 transcripts per million (TPM). Gene-level attribution scores derived from these models were used to assess regulatory influence, and a confidence metric was used to rank consistently identified candidates. By assessing the impact of each gene on the model's output, we identified the genes most strongly associated with increased or decreased IRF2BP2 expression; these represent potential positive or negative regulators of IRF2BP2 expression, respectively.
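As an illustration of the two quantities described above, the following minimal sketch computes R² and a signed, perturbation-based attribution score on synthetic data. The abstract does not state which attribution method was used, so `perturbation_attribution`, the stand-in `toy_model`, and the three synthetic "genes" are assumptions made for illustration only and are not the authors' implementation.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

def perturbation_attribution(predict, X, delta=1.0):
    """Hypothetical attribution: mean change in the prediction when one
    input gene is raised by `delta` in every sample. A positive score
    suggests a positive regulator, a negative score a negative one."""
    X = np.asarray(X, dtype=float)
    baseline = predict(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_shift = X.copy()
        X_shift[:, j] += delta
        scores[j] = np.mean(predict(X_shift) - baseline)
    return scores

# Synthetic data: 200 samples, 3 input genes (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)

def toy_model(inputs):
    """Stand-in for a trained network (assumption: any callable works)."""
    return inputs @ true_w

r2 = r_squared(y, toy_model(X))
scores = perturbation_attribution(toy_model, X)
print(f"R^2 = {r2:.3f}")
print("attribution scores:", scores)
```

An R² of 1 means the model explains all variation in the target, while a model that always predicts the mean scores 0. In this sketch the sign of each attribution score separates candidate positive regulators from negative ones.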

To complement our neural network-based analysis, we then implemented a linear regression model as an independent approach to identify regulators of IRF2BP2 expression. This model predicts IRF2BP2 expression as a weighted sum of the expression levels of all other genes, so that each gene's contribution can be read directly from its coefficient.
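The weighted-sum model can be sketched as an ordinary least-squares fit whose coefficients are then ranked. This is a minimal sketch on synthetic data: the function name `fit_linear_regulators`, the placeholder gene names, and the choice to standardize inputs are assumptions, and the abstract does not say whether regularization was applied. With real transcriptomes, where genes usually outnumber samples, an unregularized fit is not uniquely determined, so some form of penalty would likely be needed.

```python
import numpy as np

def fit_linear_regulators(X, y, gene_names):
    """Fit y ~ intercept + sum_j w_j * z_j on standardized genes and
    return (gene, weight) pairs sorted from most positive to most negative."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so coefficients are comparable across genes.
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(y)), Xz])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    weights = coef[1:]                                # drop the intercept
    order = np.argsort(-weights)
    return [(gene_names[i], float(weights[i])) for i in order]

# Synthetic example with placeholder gene names (not real results).
rng = np.random.default_rng(1)
genes = ["GENE_A", "GENE_B", "GENE_C"]
X = rng.normal(size=(300, 3))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.1, size=300)

ranked = fit_linear_regulators(X, y, genes)
for gene, weight in ranked:
    print(f"{gene}: {weight:+.2f}")
```

Genes at the top of the ranking carry the largest positive coefficients (candidate positive regulators), and those at the bottom carry the most negative ones.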

To increase confidence in our predicted regulators, we performed an integrative analysis to identify genes consistently highlighted by both the neural network and linear regression models. This step refined our candidate list by prioritising genes that were predicted to be the most robust regulators of IRF2BP2 and were supported by multiple model types, thereby strengthening the rationale for experimental validation. We focused our subsequent analyses on positive regulators of IRF2BP2 expression, because reducing their expression is predicted to decrease IRF2BP2 expression and thereby induce the biologically relevant phenotype of AML cell-specific cell death.
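One simple way to combine two model types is to keep only the genes that rank among the top positive candidates in both and order them by summed rank. The abstract does not describe the actual integration procedure, so the function `consensus_positive_regulators`, the `top_k` cut-off, the rank-sum ordering, and the placeholder scores below are all assumptions.

```python
def consensus_positive_regulators(nn_scores, lr_weights, top_k):
    """Return genes that are among the top_k positive candidates in both
    score dictionaries, ordered by summed rank (lower is better)."""
    def top_positive(scores):
        positive = {gene: s for gene, s in scores.items() if s > 0}
        ranked = sorted(positive, key=positive.get, reverse=True)[:top_k]
        return {gene: rank for rank, gene in enumerate(ranked, start=1)}

    nn_rank = top_positive(nn_scores)
    lr_rank = top_positive(lr_weights)
    shared = set(nn_rank) & set(lr_rank)
    # Ties in summed rank are broken alphabetically for determinism.
    return sorted(shared, key=lambda g: (nn_rank[g] + lr_rank[g], g))

# Placeholder scores for illustration only (not real results).
nn_scores = {"GENE_A": 0.9, "GENE_B": 0.5, "GENE_C": -0.7, "GENE_D": 0.1}
lr_weights = {"GENE_A": 0.4, "GENE_B": -0.2, "GENE_C": -0.5, "GENE_D": 0.3}

consensus = consensus_positive_regulators(nn_scores, lr_weights, top_k=3)
print(consensus)
```

Here `GENE_B` is dropped because the two models disagree on its sign, and `GENE_C` is dropped because it is a negative regulator in both.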

Among the top candidates, insulin receptor substrate 2 (IRS2) emerged as a robust positive regulator of IRF2BP2, predicting that reduced IRS2 expression should decrease IRF2BP2 expression. To test this prediction, we treated AML cell lines and patient-derived AML cells with NT157, a selective IRS1/2 inhibitor known to decrease IRS2 expression. Treatment with low concentrations of NT157 decreased IRF2BP2 protein expression after three hours, followed by reduced cell viability and an increase in Annexin/PI-positive cells as a surrogate marker of apoptosis.

Our study presents a machine learning-guided framework for identifying transcriptional regulators of biologically validated cancer dependencies. Importantly, the experimental validation of one predicted regulator underlines the biological relevance of our computational strategy. This approach may facilitate the discovery of therapeutic entry points for hard-to-characterize or otherwise intractable targets across diverse biological contexts.
